Computer simulations of language change notes

This website collects my personal notes on computer simulations of language change. These notes are provided to bring full transparency to my research process. Since they are only notes, they do not reflect my final thoughts on a topic and should not be interpreted as such. To read finished papers, please consult my website. Do not use these notes as a basis for your own scientific research. Start from high-quality, peer-reviewed scientific literature instead.

The Changing English Language. Psycholinguistic perspectives

this book brings together historical linguistics and psycholinguistics (p. 1)

Language change as it proceeds from generation to generation through daily interaction of speakers may be shaped by language-internal, social and psycholinguistic factors. While the first two factors have been thoroughly researched in historical linguistics and sociolinguistics, the third has not been as systematically addressed as one might have expected. (p. 1)

Frequency

The Ecclesiastes Principle in Language Change

Introduction

Thus, the name grammar of preindustrial England was more similar to the name grammar of modern Korean than to the present-day name grammar of English (Ramscar et al. 2013c), which is changing rapidly. This can be seen in Figure 2.2, which presents US Social Security registration data on personal names. The horizontal axis represents the time period from 1880 to 2010. The vertical axis presents the entropy of the distribution of personal names for males and females separately. The entropy measure quantifies the amount of uncertainty about a male or female name. Entropies are larger when there are more different names, and when these names have more similar probabilities. Thus, a greater entropy indicates, informally, that it is more difficult to guess a name and that it will take more time to retrieve a name from memory. (p. 23)

name entropy is rising!
using entropy as a measure might be really useful for priming
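
A minimal sketch of how the entropy measure from the quote above could be computed, to see what it would look like in a simulation. The name samples are hypothetical toy data, not the US Social Security data, and the function name is my own.

```python
import math
from collections import Counter


def shannon_entropy(names):
    """Return the Shannon entropy (in bits) of the empirical name distribution."""
    counts = Counter(names)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy samples (assumption: invented for illustration only).
few_names = ["John"] * 6 + ["William"] * 2
many_names = ["John", "William", "Liam", "Noah", "Mason", "Ethan", "Aiden", "Logan"]

print(shannon_entropy(few_names))   # two names, unequal probabilities: low entropy
print(shannon_entropy(many_names))  # eight equally likely names: 3.0 bits
```

More different names with more similar probabilities give a higher value, which matches the informal description in the quote.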

(p. 24)

Figure 2.2: The increase in entropy of US given names, 1880–2010.

Language Change over the Lifetime

For words with a log frequency lower than 5 in the British National Corpus, we see a dramatic increase in performance for older versus younger subjects. (p. 25)

What these studies show is that over our lifetime, we slowly but steadily increase our mastery of the vocabulary. This mastery, however, is not restricted to knowledge of more words; it extends to collocational knowledge and to articulatory fluency. (p. 25)

This indicates that the oldest subjects are most sensitive to collocation frequency, whereas the youngest subjects are the least sensitive. This finding is consistent with the fact that older subjects have had more experience with the language and have, as a consequence, become more sensitive to lexical co-occurrence probabilities. (p. 27)

We can see the effects of practice over the lifetime by investigating the fine details of articulatory trajectories for words of different frequencies. The more frequent a word is, the more opportunities a speaker has had to practice her motor system on the articulation of that word, and the more likely it is that we can observe discriminative differentiation as a function of experience. Thus, by studying words of different frequency with respect to how articulatory gestures are executed, we can gain insight into changes that are likely to take place within a given word as a speaker becomes more practiced in uttering that word. (p. 28)

(p. 29)

The upper panels show that the tongue body sensor moves further down when producing the vowel /a:/ when the word has a higher frequency of occurrence. In other words, for higher-frequency words in which the next syllable realizes the third person plural ending (an apical /n/), a higher frequency affords a more precise and distinctive articulation of this low vowel. The lower panels show the pattern in reverse, when a /t/ realizes the second person plural inflection. The more frequent this inflectional variant is, the earlier the tongue starts preparing for the articulation of the (laminal) /t/. (p. 30)

I find this very strange. Why would a high-frequency form, of all things, have a more careful pronunciation?

The Ecclesiastes Principle in Language Change

In the third century BC, the philosopher and wisdom teacher Qohelet wrote

For in much wisdom is much grief: and he that increaseth knowledge increaseth sorrow. (Ecclesiastes 1:18, translation King James Version)

This characterization of the human condition applies straightforwardly to human learning. The accumulation of knowledge does not come for free. We refer to this as the Ecclesiastes Principle. (p. 32)

The Ecclesiastes Principle has more subtle, but no less far-reaching, consequences when we consider the details of lexical learning. Above, we examined the performance on the paired associate learning task as a function of collocation frequency and age. We observed that older speakers reveal greater sensitivity to collocation frequency, which fits with our hypothesis that language proficiency increases as experience accumulates over the lifetime. What we have not yet discussed is why it is that older speakers perform less well than younger speakers on the pairs with lower collocation frequencies. (p. 33)

Thus, the Ecclesiastes Principle manifests itself in the context of learning as a force prohibiting the learning of novel knowledge if and only if that novel knowledge does not make sense given prior experience. We think this same force may serve to speed the demise of words that are in the process of becoming obsolete. It is not only that the contexts in which words such as telegraph or walkman were once used will become increasingly rare, but the lexical collocates that were once predictive of these words will increasingly lose this predictivity where they continue to be encountered in other contexts. (p. 34)

Entrenchment and the Ecclesiastes Principle

Frequency of occurrence is widely used as a measure of entrenchment in memory (see Hilpert, this volume). However, simple frequency counts do not take into account the effects of co-learning and the costs that accrue with the accumulation of knowledge. (p. 37)

Statistical measures as used in studies of lexicogrammatical attraction (Allan 1980; Stefanowitsch and Gries 2003; Ellis 2006a; Schmid and Küchenhoff 2013) take into consideration that words are used as part of a system (see also Ellis, this volume). However, the 2 × 2 contingency tables on which these measures are calculated require simplified binary contrasts that do not do full justice to the complexity of the language system. (p. 38)

This leaves the analyst with two options. One option is to complement frequency counts with a wide range of other measures, such as burstiness, dispersion, age of acquisition, conditional probabilities given preceding or following words and multiword probabilities (Bannard and Matthews 2008). Baayen (2011b) showed, using multiple regression, that when a wide range of variables correlated with word frequency is taken into account, there is very little variance left for word frequency to explain. Simple counts isolate units such as words from the system of which they are part. The more measures that probe the system are taken into account, the less useful bare frequency counts become. (p. 38)

For example, the word great can be followed by many other words (care, deal, story, about, for, if, on, used, . . .). Whenever great is followed by story, the link between great and story is strengthened, while the links between great and all other words that have been encountered following it are weakened. As a consequence, simply counting frequencies and cooccurrence frequencies will not do justice to the constantly on-going recalibration of the language system with respect to the words that could have appeared, but did not. (p. 39)
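
A rough sketch of the recalibration described in the quote above: each time great is followed by a word, that link is strengthened and the links to all other previously encountered continuations are weakened. This assumes a simple delta-rule update with a single cue; the learning rate and the function name are my own assumptions, and the chapter's actual model may differ.

```python
def update_links(weights, observed, rate=0.1):
    """Delta-rule update for one cue word.

    The observed continuation is pushed toward 1, every other continuation
    encountered so far is pushed toward 0. `rate` is a hypothetical value.
    """
    weights.setdefault(observed, 0.0)
    for word in weights:
        target = 1.0 if word == observed else 0.0
        weights[word] += rate * (target - weights[word])
    return weights


# Continuations of "great" in the order they are encountered (toy sequence).
links = {}
for following_word in ["deal", "care", "deal", "story"]:
    update_links(links, following_word)

print(links)
```

Unlike a plain co-occurrence count, the weight of deal drops when story occurs, even though deal itself was not observed at that moment.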

I don't think this is useful for us right now.

Final remarks

Especially in the domain of lexis, we are faced with the problem that although the highest-frequency words are common knowledge, as we move out into the low-frequency tail of Zipfian word frequency distributions, knowledge fractionates across individuals. Both classical factorial (Carroll and White 1973) and recent crowd-sourcing studies (Keuleers et al. 2015) highlight the specialized, and hence restricted, knowledge of individual language users. But perhaps knowledge specialization is the evolutionary answer to the limits on what an individual member of a community can achieve. From this perspective, the registers, genres and specialist vocabularies appear as just another variation of nature on intra-species variation and eusociality. (p. 46)

More in general, we think it is worth reflecting on parallels between language change within the lifespan of an individual and language change in the course of the history of a given society. It is not the case that by the age of twenty-one, a language has been learned, to remain stationary and unaltered over the remaining lifetime. Over the lifetime, new words and expressions are constantly encountered as speakers read, watch TV, travel to new places with unfamiliar names for streets and buildings, meet new people and buy novel products. This accumulation of experience is unlikely to be uniform across the lexicon and the construction, and we anticipate that trade-offs at individual and aggregate levels, such as the adaptation toward pronouns under onomasiological overload, or the increase in the use of compounds (Scherer, 2005), are more widespread than we can currently imagine. (p. 47)

Should we build this effect into the simulation?

Frequencies in Diachronic Corpora and Knowledge of Language

Introduction

In linguistics, the term ‘frequency’ is arguably most strongly associated with the idea of text frequency, because of the many studies that testify to its effects: the text frequency with which a linguistic item occurs is an important determinant of how early and how easily that item is learned, how strongly it is mentally represented, and how quickly it can be retrieved from memory, amongst other things (Ellis 2002; Bybee 2010). Yet, it is clear that frequency effects do not only emerge from high text frequency, but also from high type frequency (Bybee and Thompson 1997; de Jong et al. 2000), high frequency of co-occurrence (Jurafsky et al. 2001; Gries et al. 2005) and high frequency in the recent linguistic context (Szmrecsanyi 2006). There is now a sizable literature on frequency effects that also discusses the roles of relative frequency, transitional probabilities and perplexity (see for example the contributions in Gries and Divjak 2012). It will be the goal of this chapter to explore how what is known about frequency effects from psycholinguistic work can be fruitfully transferred to the study of historical language change on the basis of diachronic corpora.

(p. 49-50)

Text frequency

For many historical linguists, developments of this kind are actually interesting in themselves, regardless of any cognitive implications. If an item increases or decreases substantially in frequency, that signals a process of change in the language system, which can be investigated in complete detachment from its speakers and their cognition. However, since text frequency is associated with a range of cognitive correlates, even a simple measurement such as the one shown above has psychological repercussions. Based on what is known about the effects of text frequency in synchronic language use, it can be speculated that for speakers of English during the early nineteenth century, processing the string for want of was slightly different from what it is for present-day speakers. (p. 51)

Chunking

The collocation for want of belongs to the set of formulaic word sequences that speakers learn and memorize as units. As a whole, the phrase conveys idiomatic, non-compositional meaning. Chunking is dependent on text frequency; that is, the more frequently a string of elements is processed together, the greater the likelihood that speakers do not decompose the string into its component parts but instead process it as a single unit (Bybee and Scheibman 1999; Bybee and Moder, this volume). (p. 52)

Entrenchment

A second, closely related effect of text frequency is the strength with which a string such as at the start or for want of is represented in speakers’ minds. This phenomenon is commonly discussed under the heading of entrenchment (Schmid 2010, to appear; Blumenthal-Dramé 2012). Highly entrenched items are processed more quickly and more accurately, and these effects can be explained as a direct consequence of a learning process that is fueled by repeated experience (Ellis 2002: 152, this volume). (p. 52)

Conserving effect of frequency

A third consequence of high text frequency is what Bybee (2006: 715) has called the conserving effect of frequency. Note that the preposition for in for want of conveys the meaning of a cause, rather than a beneficiary (cf. Hundt and Leech 2012 on the decline of causal for). The noun want expresses the lack of something.

[…]

Whereas the words for and want by themselves only maintain very weak associations with these older meanings, the collocation for want of provides a niche in which those meanings are conserved, even in modern usage. Importantly, this kind of retention is an effect of high text frequency during earlier historical stages. If the text frequency of a complex linguistic unit leads to entrenchment, the characteristics of that unit may be preserved, even if structurally similar units are subject to change, and even if that unit itself becomes less frequent as time goes on.

(p. 53)

Conclusion

In summary, text frequencies reflect how familiar speakers at a given historical stage would have been with a linguistic unit. More specifically, high text frequencies relate to the chunking of a complex unit, its strength of mental representation, and its potential to be conserved in the language system over longer periods of time.

Relative frequency

Another type of measurement that is regularly found in historical corpus-based studies is the measurement of relative frequencies, that is, the frequency of one linguistic unit as compared to the frequency of another.

(p. 54)

Figure 3.2: Relative frequency increase of keep V-ing in TIME.

However, measurements of relative frequencies typically have an underlying psychological assumption, namely that the two or more forms that are being compared form part of a single cognitive category in the minds of speakers. Hence, the display of the frequency development of keep V-ing in Figure 3.2 suggests that some relation between grammatical keep V-ing and other, lexical uses of keep is tacitly assumed. The fact that keep V-ing increases its market share would then indicate that in the overarching category that mentally represents the verb keep, there has been some re-organization. Speakers today entertain somewhat stronger associative ties between keep and a verb in the ing-form than speakers used to do about a century ago.

(p. 54-55)

While incongruous is processed holistically, as a simplex word, speakers tend to process invulnerable analytically, as a morphologically complex word. Hay presents evidence that naïve speakers, when asked to compare incongruous and invulnerable, judge the latter to be more complex, even though the morpheme count of the two words does not differ (Hay 2001: 1049).

The explanation for these differing judgments lies in the relative frequencies of the respective bases and derivatives. Table 3.1 contrasts the base and derivative frequencies of congruous and vulnerable in the BNC. What can be seen is that the derivative incongruous has a very high relative frequency (shown in brackets) vis-à-vis its morphological base, meaning that speakers are much less likely to hear the word congruous on its own, rather than as a part of incongruous. The same is not true of invulnerable: The adjective vulnerable occurs more often on its own than as part of the derivative invulnerable.

(p. 55)
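
A small sketch of the base/derivative comparison, in case we want a decomposability flag per word in the simulation. The counts below are hypothetical placeholders, not the BNC figures from Table 3.1, and the threshold of 1.0 is my own simplifying assumption.

```python
def relative_frequency(derivative_count, base_count):
    """Return the derivative frequency divided by the frequency of its base."""
    if base_count <= 0:
        raise ValueError("base_count must be positive")
    return derivative_count / base_count


def likely_holistic(derivative_count, base_count, threshold=1.0):
    """Heuristic (assumption): a derivative that outnumbers its base
    is more likely to be processed as a single unit."""
    return relative_frequency(derivative_count, base_count) > threshold


# Hypothetical counts for illustration only.
print(likely_holistic(derivative_count=40, base_count=4))    # derivative dominates
print(likely_holistic(derivative_count=10, base_count=500))  # base dominates
```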

Reduction effects are thus not only a result of high text frequency (cf. Section 3.2). Even infrequent words may be reduced, as long as they are predictable enough. In environments where words are highly predictable, speakers subconsciously estimate that the risk of misunderstanding is low and therefore they permit themselves a less effortful pronunciation. Diachronically, this can lead to word coalescence and phonological erosion, as for example in gonna, kinda or wouldya. (p. 56)

Type frequency

The type frequency of a linguistic unit is measured as the number of different variants in which that unit appears in a given corpus. (p. 56)

In the light of what the preceding paragraphs have discussed, type frequencies and hapax legomena are intimately related to the mental representations of linguistic schemas that speakers have at their disposal for the formation of new coinages and phrasings. (p. 59-60).
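
A minimal sketch of counting type frequency and hapax legomena for one schema. The token list is a toy example I made up (instances of a -ness pattern), and the function name is hypothetical.

```python
from collections import Counter


def type_and_hapax_counts(tokens):
    """Return (number of distinct types, number of types occurring exactly once)."""
    counts = Counter(tokens)
    types = len(counts)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return types, hapaxes


# Toy instances of one schema (assumption: invented data).
tokens = ["happiness", "darkness", "happiness", "kindness", "greenness"]
print(type_and_hapax_counts(tokens))  # (4, 3)
```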

Burstiness/dispersion

The dispersion of a linguistic unit reflects how evenly it is distributed across the parts of a corpus. If a corpus is divided up into 1,000-word chunks, how many of those contain the unit in question? The burstiness of a linguistic unit captures how regular the intervals are at which the unit appears in a running text. Items with high burstiness appear in dense groups, i.e. ‘bursts’, at unpredictable intervals. If such a word appears once in a text, chances are high that it will appear again soon, perhaps already in the next sentence, but after a few such occurrences it could be completely absent from the next fifty pages. Once five sentences have gone by without any occurrence of the word, the burst can be assumed to be over. By contrast, items with low burstiness occur regularly and individually at dependable intervals. They are the steady linguistic companions of any speaker or writer. (p. 60)
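
A sketch of how both measures could be operationalized. Dispersion follows the chunk-based description in the quote. For burstiness the chapter gives no formula here, so I assume a coefficient of the form (sd - mean) / (sd + mean) over the gaps between occurrences: -1 for perfectly regular intervals, higher values for clumped occurrences. Function names and toy texts are my own.

```python
import statistics


def dispersion(tokens, unit, chunk_size=1000):
    """Proportion of fixed-size chunks that contain the unit at least once."""
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    if not chunks:
        return 0.0
    return sum(unit in chunk for chunk in chunks) / len(chunks)


def burstiness(tokens, unit):
    """Assumed coefficient (sd - mean) / (sd + mean) of the gaps between occurrences.

    Returns None when there are fewer than two gaps to compare.
    """
    positions = [i for i, tok in enumerate(tokens) if tok == unit]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    mean = statistics.mean(gaps)
    sd = statistics.pstdev(gaps)
    return (sd - mean) / (sd + mean)


# Toy texts: "x" at regular intervals versus "x" in two dense groups.
regular = ["x", "a", "a", "a"] * 5
bursty = ["x"] * 3 + ["a"] * 20 + ["x"] * 3

print(dispersion(regular, "x", chunk_size=4), burstiness(regular, "x"))
print(dispersion(bursty, "x", chunk_size=4), burstiness(bursty, "x"))
```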

Differences in burstiness, such as between without (low) and London (high), have further been linked to semantic differences, specifically to a hierarchy of increasingly abstract meanings (Pierrehumbert 2012: 104). The more abstract the meaning of a word is, the lower its degree of burstiness. The explanation for this is straightforward: the more general the meaning of a word is, the more contexts are available for its use, and the more regular are its appearances. (p. 62)

Altmann et al. (2009: 5) contrast four different types of words (entities, predicates, modifiers and high-level operators) and find that with increasing abstractness, burstiness recedes. However, this effect is mitigated by text frequency. Differences in burstiness are more pronounced in low-frequency items, whereas for highly frequent items, which are less bursty in general, differences in abstractness do not yield a strong effect. (p. 62)

Behavioral Profile Frequency

The bottom line of these observations is that the linguistic competence of speakers must include probabilistic knowledge of variation, that is, knowledge of the variants that instantiate a given linguistic unit, and knowledge of the contexts in which these variants are appropriately used. Variation can not only be observed in synchrony, but also diachronically, since the conventions of language use are gradually shifting. (p. 64)

the behavioral profile of a linguistic unit such as a presentational object relative clause (That’s the one that I want) would be an inventory of features such as the ones that are presented in Figure 3.5 below, along with their respective text frequencies, type frequencies and frequencies of mutual co-occurrence. (p. 54)

Concluding remarks

Priming

Priming and Language Change

Introduction

they are much more commonly the result of priming – a largely non-conscious or automatic tendency to repeat what one has comprehended or produced. Much of the time, priming causes people to decide between already-available alternatives, but it can also lead people to use new words, expressions or constructions, and moreover to remember them for subsequent use. If this process occurs in a population rather than in an individual, it should lead to language change. (p. 173)
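
A first sketch of what priming in a population could look like as a simulation. This is my own toy setup, not a model from the chapter: every agent has a probability of using a new variant, and a hearer shifts that probability slightly toward whatever the speaker just used. All parameter values are hypothetical.

```python
import random


def simulate_priming(n_agents=20, n_interactions=5000, rate=0.05, start=0.1, seed=1):
    """Toy population model of priming.

    Each agent holds a probability of producing the new variant. In every
    interaction a random speaker produces a variant and a random hearer
    moves its own probability toward the variant it just heard.
    """
    rng = random.Random(seed)
    probs = [start] * n_agents
    for _ in range(n_interactions):
        speaker, hearer = rng.sample(range(n_agents), 2)
        used_new = rng.random() < probs[speaker]
        target = 1.0 if used_new else 0.0
        probs[hearer] += rate * (target - probs[hearer])
    return probs


final = simulate_priming()
print(sum(final) / len(final))  # mean usage of the new variant after the run
```

With `rate=0.0` nothing changes, so comparing runs with and without priming is a quick sanity check before adding anything more realistic, such as decay or a lexical boost.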

Psychological Mechanisms of Language Change

To achieve such alignment of situation models, interlocutors do not extensively reason about each other’s mental states, but rather align at other linguistic (and indeed non-linguistic) levels of representation, such as choice of words, pronunciation and grammar, in a largely automatic fashion. Essentially, interlocutors prime each other to speak about things in the same way, and people who speak about things in the same way are more likely to think about them in the same way as well. (p. 174)

A rather separate tradition has considered the extent to which people tend to produce utterances with the same linguistic structure as an utterance that they have just produced or comprehended. Most of this work is concerned with syntax and is referred to as syntactic or structural priming. Bock (1986) found that participants tended to produce sentences such as passives after having produced another otherwise unrelated passive, without being aware of the relationship between the sentences. This suggested that the linguistic representations associated with passives were primed, just as words can be primed. (p. 174)

Alignment

Structural Priming

there is a strong tendency to repeat syntax in dialogue. For cross-speaker priming to play a major role in language change, it must be widespread and have large effects, and this appears to be the case. (p. 177)

Levelt and Kelter (1982) asked Dutch shopkeepers Om hoe laat gaat uw winkel dicht? (‘At what time does your shop close?’) or Hoe laat gaat uw winkel dicht? (‘What time does your shop close?’). In the former case, replies tended to include the preposition (e.g. Om vijf uur, ‘At five o’clock’); in the latter, replies tended to exclude the preposition (e.g. Vijf uur, ‘Five o’clock’). Perhaps surprisingly, this effect did not persist when a clause intervened between prime and target.

(p. 177)

Long-term priming

Language change of course also requires long-term effects of priming. The emphasis of most priming work, and the theoretical account of alignment due to Pickering and Garrod (2004), focuses largely on short-term effects, either from one utterance to the next or within the context of an individual conversation. However, long-term priming does occur. (p. 178)

More strikingly, Kaschak et al. (2011) found that priming persisted over about a week. In their study, one group of participants completed (written) sentence stems that were designed to elicit a prepositional object completion, and another group completed stems that were designed to elicit a double-object completion. Participants returned to the laboratory a week later and were presented with stems that could be completed with either construction. They tended to persist in the construction that they used. Such effects may require that the experimental task stay the same across sessions (Kaschak, Kutta and Coyle 2014). (p. 178-179)

Priming Ungrammatical Structures

Kaschak and Glenberg (2004) found that reading times for a construction that is ungrammatical in Standard English (the needs-construction, as in The meal needs cooked) decreased with consecutive presentations. These results generalized across modalities (spoken to written language) and to different verbs (Kaschak and Glenberg 2004) and sentential contexts (Kaschak 2006). In addition, Luka and Barsalou (2005) showed that grammaticality ratings for moderately ungrammatical sentences (e.g. Armanda carried Fernando the package or Rachel needs to get a tattoo as colourful as Bob has) were higher for those participants who had read them previously than for those who saw them for the first time. (p. 179)

Boosts to Priming

Branigan et al. (2000) found that the participant repeated the confederate’s choice of construction around 77 percent of the time when the verb was repeated, and over 63 percent of the time when the verb differed. The enhanced tendency to repeat grammar in the context of lexical (verb) repetition is known as the lexical boost and supports the claim of the interactive-alignment account that repetition at one linguistic level (the lexicon) leads to repetition at another level (grammar). (p. 180)

We should note, however, that the lexical boost appears to be largely or entirely short-lived. Hartsuiker et al. (2008) found a strong boost when prime and target were adjacent (lag 0) but not when they were separated (lag 2 or lag 6). This is perhaps surprising, because a long-term combination of syntactic form and lexical content would have been compatible with the establishment of routines containing fixed words and syntax. It is possible that long-term lexically specific priming does occur under some circumstances (and indeed Kaschak and Borreggine 2008 reported such an effect for one verb, lend). But the relationship between the lexical boost to priming and the establishment of routines remains unclear. (p. 180-181)
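
A possible way to parameterize the lexical boost in a simulation. The two rates are the ones quoted above from Branigan et al. (2000); restricting the boost to lag 0 follows the Hartsuiker et al. (2008) finding. Combining the two studies into one function, and treating "over 63 percent" as exactly 0.63, are my own simplifying assumptions.

```python
def repeat_probability(same_verb, lag, base=0.63, boosted=0.77):
    """Probability of repeating the prime's construction (assumed simplification).

    The lexical boost only applies when the verb is repeated and prime and
    target are adjacent (lag 0); otherwise the base priming rate is used.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if same_verb and lag == 0:
        return boosted
    return base


print(repeat_probability(same_verb=True, lag=0))   # boosted rate
print(repeat_probability(same_verb=True, lag=2))   # boost gone after intervening material
print(repeat_probability(same_verb=False, lag=0))  # base rate
```

This ignores any decay of structural priming itself over longer lags, which would need a separate parameter.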

Corpus Work

Priming in Children

Priming and Alignment between Languages

Routinization

Most discussion of routines refers to the long-term development of fixed expressions that come to behave like words (e.g. Aijmer 1996a; Kuiper 1996; Nunberg et al. 1994; Bybee 2006). But we propose that they often originate in the context of a particular interchange. If one speaker starts to use an expression and gives it a particular meaning, the other will most likely follow suit – clearly an effect of priming. Thus routines are set up ‘on the fly’ during conversation. (p. 183)

We now consider the implications of routinization for language change. A key issue in the study of language change is explaining how changes in the language can spread within and across generations of speakers. Kirby (1999) refers to this as the problem of linkage. In biological evolution, linkage occurs through the inheritance of genes from one generation to the next. The traditional linguistic analogy is to explain linkage through the passing down of a language from one generation to the next during its acquisition (see Lieven, Chapter 14 of this volume; López-Couso, Chapter 15 of this volume). It is then assumed that language change is determined by constraints (which Kirby calls the linguistic bottleneck) that apply to the language-learning mechanism. However, interactive alignment and routinization offer an alternative linkage mechanism associated with language use. In the same way that experimental communities of speakers establish their own routines over the course of repeated interactions, so real communities of speakers can establish and maintain routines as well.

Conclusions
